Nested word

In computer science, more specifically in automata and formal language theory, nested words are a concept proposed by Alur and Madhusudan as a joint generalization of words, as traditionally used for modelling linearly ordered structures, and of ordered unranked trees, as traditionally used for modelling hierarchical structures. Finite-state acceptors for nested words, so-called nested word automata, then give a more expressive generalization of finite automata on words. The linear encodings of languages accepted by finite nested word automata gives the class of visibly pushdown languages. The latter language class lies properly between the regular languages and the deterministic context-free languages. Since their introduction in 2004, these concepts have triggered much research in that area.[1]

Contents

Formal definition

To define nested words, we first need to define matching relation. As usual, for a nonnegative integer \ell, we use the notation [\ell] to denote the set \{1,2,\ldots,\ell-1,\ell\}, with the special case [0]=\emptyset.

A matching relation ↝ of length \ell\ge 0 is a subset of \{-\infty, 1,2,\ldots,\ell-1,\ell\}\times\{1,2,\ldots,\ell-1,\ell,\infty\} such that:
(i) all nesting edges are forward, that is, if i ↝ j then i<j;
(ii) nesting edges never have a finite position in common, that is, for -\infty < i < \infty , there is at most one position h such that h ↝ i, and there is at most one position j such that i ↝ j; and
(iii) nesting edges never cross, that is, we can't find i<i'≤j<j' such that both i ↝ j and i' ↝ j'.
i is referred to as a call position, if i ↝ j for some j, as a pending call if i ↝ ∞, as a return position, if h ↝ i for some h and as a "pending return" if -∞ ↝ i.

A nested word of length \ell over an alphabet Σ is a pair (w,↝), where w is a word of length \ellover Σ (in the usual sense) and ↝ is a matching relation of length \ell.

Encoding nested words into ordinary words

Nested words over the alphabet \Sigma=\{a_1,a_2,\ldots,a_n\} can be encoded into "ordinary" words over the tagged alphabet  \hat{\Sigma}, in which each symbol a from Σ has three tagged counterparts: the symbol ⟨a for encoding a call position in a nested word labelled with a, the symbol a⟩ for encoding a return position labelled with a, and finally the symbol a itself for representing an internal position labelled with a. More precisely, let φ be the function mapping nested words over Σ to words over \hat{\Sigma} such that each nested word (w_1w_2\cdots w_\ell,↝) is mapped to the word x_1x_2...x_\ell, where the letter x_i equals ⟨a, a, or a⟩, respectively, if w_i=a and i is a call position, an internal position, or a return position, respectively.

Example

For illustration, let n=(w,↝) be the nested word over an ternary alphabet with w=abaabccca and matching relation ↝ = {(-∞,1),(2,∞),(3,4),(5,7),(8,∞)}. Then its encoding as word reads as φ(n) = a⟩⟨b⟨aa⟩⟨bcc⟩⟨ca.

Automata

Nested word automaton

A nested word automaton has a finite number of states, and operates in almost the same way as a deterministic finite automaton on classical strings: a classical finite automaton reads the input word w = w_1\cdots w_\ell from left to right, and the state of the automaton after reading the jth letter w_j depends on the state in which the automaton was before reading w_j.

In a nested word automaton, the position j in the nested word (w,↝) might be a return position; if so, the state after reading w_j will not only depend on the linear state in which the automaton was before reading w_j, but also on a hierarchical state propagated by the automaton at the time it was in the corresponding call position. In analogy to regular languages of words, a set L of nested words is called regular if it is accepted by some (finite-state) nested word automaton.

Visibly pushdown automaton

Nested word automata are an automaton model accepting nested words. There is an equivalent automaton model operating on (ordinary) words. Namely, the notion of a deterministic visibly pushdown automaton is a restriction of the notion of a deterministic pushdown automaton.

Following Alur and Madhusudan,[2] a deterministic visibly pushdown automaton is formally defined as a 6-tuple M=(Q,\  \hat{\Sigma},\  \Gamma,\  \delta,\ q_0, \ F) where

The notion of computation of a visibly pushdown automaton is a restriction of the one used for pushdown automata. Visibly pushdown automata only add a symbol to the stack when reading a call symbol a_c\in \Sigma_c, they only remove the top element from the stack when reading a return symbol a_r\in\Sigma_r and they do not alter the stack when reading an internal event a_i\in\Sigma_{int}. A computation ending in an accepting state is an accepting computation.

As a result, a visibly pushdown automaton cannot push to and pop from the stack with the same input symbol. Thus the language L=\{a^nba^n \mid n\in\mathrm{N} \} cannot be accepted by a visibly pushdown automaton for any partition of \Sigma, however there are pushdown automata accepting this language.

If a language L over a tagged alphabet \,\hat{\Sigma} is accepted by a deterministic visibly pushdown automaton, then L is called a visibly pushdown language.

Nondeterministic visibly pushdown automata

Nondeterministic visibly pushdown automata are as expressive as deterministic ones. Hence one can transform a nondeterministic visibly pushdown automaton into a deterministic one, but if the nondeterministic automaton had s states, the deterministic one may have up to 2^{s^2} states.[3]

Decision problems

Let |A| be the size of the description of an automaton A, then it is possible to check if a word n is accepted by the automaton in time O(|A|^3\ell). In particular, the emptiness problem is solvable in time O(|A|^3). If A is fixed, it is decidable in time O(\ell) and space O(d) where d is the depth of n in a streaming seeing. It is also decidable with space O(\log(\ell)) and time O(\ell^2 \log (\ell)), and by a uniform boolean circuit of depth O(\log l).[2]

For two nondeterministic automata A and B, deciding whether the set of words accepted by A is a subset of the word accepted by B is EXPTIME-complete. It is also EXPTIME-complete to figure out if there is a word that is not accepted.[2]

Languages

As the definition of visibly pushdown automata shows, deterministic visibly pushdown automata can be seen as a special case of deterministic pushdown automata; thus the set VPL of visibly pushdown languages over \,\hat{\Sigma} forms a subset of the set DCFL of deterministic context-free languages over the set of symbols in \,\hat{\Sigma}. In particular, the function that removes matching relation from nested words transforms regular languages over nested words into context-free languages.

Closure properties

The set of visibly pushdown languages is closed under the following operations:[3]

For the intersection operation, one can construct a VPA M simulating two given VPAs M_1 and M_2 by a simple product construction (Alur & Madhusudan 2004): For i=1,2, assume M_i is given as (Q_i,\  \hat{\Sigma},\  \Gamma_i,\  \delta_i, \ s_{i},\ Z_i, \ F_i). Then for the automaton M, the set of states is \, Q_1\times Q_2, the initial state is \left(s_{1}, s_2\right), the set of final states is F_1 \times F_2, the stack alphabet is given by \,\Gamma_1\times\Gamma_2, and the initial stack symbol is (Z_1,Z_2).

If M is in state (p_1,p_2) on reading a call symbol \left\langle a\right., then M pushes the stack symbol (\gamma_1,\gamma_2) and goes to state (q_1,q_2), where \gamma_i is the stack symbol pushed by M_i when transitioning from state p_i to q_i on reading input \left\langle a\right..

If M is in state (p_1,p_2) on reading an internal symbol a, then M goes to state (q_1,q_2), whenever M_i transitions from state p_i to q_i on reading a.

If M is in state (p_1,p_2) on reading a return symbol \left. a\right\rangle, then M pops the symbol (\gamma_1,\gamma_2) from the stack and goes to state (q_1,q_2), where \gamma_i is the stack symbol popped by M_i when transitioning from state p_i to q_i on reading \left. a\right\rangle.

Correctness of the above construction crucially relies on the fact that the push and pop actions of the simulated machines M_1 and M_2 are synchronized along the input symbols read. In fact, a similar simulation is no longer possible for deterministic pushdown automata, as the larger class of deterministic context-free languages is no longer closed under intersection.

In contrast to the construction for concatenation shown above, the complementation construction for visibly pushdown automata parallels the standard construction[4] for deterministic pushdown automata.

Moreover, the class of visibly pushdown languages is closed under

Relation to other language classes

Alur & Madhusudan (2004) point out that the visibly pushdown languages are more general than the parenthesis languages suggested in McNaughton (1967). As shown by Reghizzi & Mandrioli (2009), the VPL in turn are strictly contained in the class of languages described by operator-precedence grammars, which were introduced by Floyd (1963). In comparison to conjunctive grammars, a generalization of context-free grammars, Okhotin (2011) shows that the linear conjunctive languages form a superclass of the visibly pushdown languages. The following table puts the family of visibly pushdown languages in relation to other language families in the Chomsky hierarchy.

Other models of description

Visibly pushdown grammars

Visibly pushdown languages are exactly the languages that can be described by visibly pushdown grammars.[2]

Visibly pushdown grammars can be defined as a restriction of context-free grammars. A visibly pushdown grammars G is defined by the 4-tuple:

G = (V=V^0\cup V^1\,, \Sigma\,, R\,, S\,) where

Here, the asterisk represents the Kleene star operation and \epsilon is the empty word.

Uniform Boolean circuits

The problem whether a word n is accepted by a given nested word automaton can be solved by uniform Boolean circuits of depth O(\log(\ell)).[2]

Logical description

Regular languages over nested words are exactly the set of languages described by Monadic SO_(complexity) with two unary predicates call and return, linear successor and the matching relation ↝.[2]

Notes

  1. ^ Google Scholar search results for "nested words" OR "visibly pushdown"
  2. ^ a b c d e f Alur & Madhusudan (2009)
  3. ^ a b Alur & Madhusudan (2004)
  4. ^ Hopcroft & Ullman (1979), p. 238 f.

See also

Model checking

References

External links